On the Computation of Distances for Probabilistic Context-Free Grammars

Authors

Colin de la Higuera

James Scicluna

Mark-Jan Nederhof

Abstract

Probabilistic context-free grammars (PCFGs) are used to define distributions over strings, and are powerful modelling tools in a number of areas, including natural language processing, software engineering, model checking, bio-informatics, and pattern recognition. A common and important question is that of comparing the distributions generated or modelled by these grammars: this is done through checking language equivalence and computing distances. Two PCFGs are language equivalent if every string has the same probability under both grammars. This also means that the distance (whichever norm is used) is zero. It is known that the language equivalence problem is interreducible with that of multiple ambiguity for context-free grammars, a long-standing open question. In this work, we prove that computing distances corresponds to solving undecidable questions: this is the case for the L1 and L2 norms, the variation distance and the Kullback-Leibler divergence. Two more results are less negative: (1) the most probable string can be computed, and (2) the Chebyshev distance (where the distance between two distributions is the maximum difference of probabilities over all strings) is interreducible with the language equivalence problem.
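To make the distances named in the abstract concrete, here is a minimal sketch that evaluates them on two toy distributions with finite support, written as Python dictionaries from strings to probabilities. This is only an illustration of the standard definitions, not the paper's construction: the function name `distances` and the toy distributions `p` and `q` are assumptions introduced here, and the abstract's undecidability results concern distributions defined by PCFGs, whose support is generally infinite.

```python
import math

def distances(p, q):
    """Compare two finite-support distributions given as dicts: string -> probability.

    Hypothetical helper for illustration; it does not handle PCFGs.
    """
    support = set(p) | set(q)
    diffs = [abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in support]

    l1 = sum(diffs)                             # sum of absolute differences
    l2 = math.sqrt(sum(d * d for d in diffs))   # Euclidean norm of the difference
    chebyshev = max(diffs, default=0.0)         # maximum difference over all strings

    # Kullback-Leibler divergence KL(p || q); infinite if p has mass where q has none.
    kl = 0.0
    for w, pw in p.items():
        if pw == 0.0:
            continue
        qw = q.get(w, 0.0)
        if qw == 0.0:
            kl = math.inf
            break
        kl += pw * math.log(pw / qw)

    return {"l1": l1, "l2": l2, "chebyshev": chebyshev, "kl": kl}

# Toy distributions over three strings (assumed values, each summing to 1).
p = {"a": 0.5, "ab": 0.25, "abb": 0.25}
q = {"a": 0.25, "ab": 0.5, "abb": 0.25}
result = distances(p, q)
```

For identical inputs every value is zero, which matches the abstract's remark that language-equivalent grammars are at distance zero whichever norm is used.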


Similar resources

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...


Intersection for Weighted Formalisms

The paradigm of parsing as intersection has been used throughout the literature to obtain elegant and general solutions to numerous problems involving grammars and automata. The paradigm has its origins in (Bar-Hillel et al., 1964), where a general construction was used to prove closure of context-free languages under intersection with regular languages. It was pointed out by (Lang, 1994) that ...


Proceedings of the 9th International Workshop Finite State Methods and Natural Language Processing


Prefix Probability for Probabilistic Synchronous Context-Free Grammars

We present a method for the computation of prefix probabilities for synchronous context-free grammars. Our framework is fairly general and relies on the combination of a simple, novel grammar transformation and standard techniques to bring grammars into normal forms.


Query Parsing Using Probabilistic Tree Grammars

The tree representation, using rhythm for defining the tree structure and pitch information for node labeling, has proven to be effective in melodic similarity computation. In this paper we propose a solution representing melodies by tree grammars. For that, we infer probabilistic context-free grammars for the melodies in a database, using their tree coding (with duration and pitch) and classi...



Journal:

CoRR

Volume abs/1407.1513 (no issue number given)

Pages: -

Publication date: 2014
